Proteome analysis is most commonly accomplished by a combination of
two-dimensional gel electrophoresis (2DE) to separate and visualize
proteins and mass spectrometry (MS) for protein identification.
Although this technique is powerful, mature, and sensitive, questions
remain concerning its ability to characterize all of the elements of a
proteome. In the current study, more than 1,500 features were
visualized by silver staining a narrow pH range (4.9-5.7) 2D gel in
which 0.5 mg of total soluble yeast protein was separated. Fifty spots
migrating to a region of 4 cm² were subjected to MS protein
identification. Despite the high sample load and extended
electrophoretic separation, proteins from genes with codon bias values
of <0.1 (lower abundance proteins) were not found, even though fully
one-half of all yeast genes fall into that range. Proteins from genes
with codon bias values of <0.1 were found, however, if protein amounts
exceeding the capacity of 2DE were fractionated and analyzed. We
conclude that the large range of protein expression levels limits the
ability of the 2DE-MS approach to analyze proteins of medium to low
abundance, and thus the potential of this technique for proteome
analysis is likewise limited.

The genomics revolution has changed the paradigm for the 
comprehensive analysis of biological processes and systems. 
It is now hypothesized that biological processes and systems can 
be described based on the comparison of global, quantitative 
gene expression patterns from cells or tissues representing 
different states. To test this hypothesis, it is essential that 
methods for the precise measurement of gene expression be 
developed and applied. 

Several methods, including serial analysis of gene expression, 
oligonucleotide and cDNA microarrays, and large-scale se- 
quencing of expressed sequence tags have been developed to 
globally and quantitatively measure gene expression at the 
mRNA level (1, 2). The discovery of posttranscriptional mech- 
anisms that control rate of synthesis and half-life of proteins (3) 
and the ensuing nonpredictive correlation between mRNA and 
protein levels expressed by a particular gene (4, 5) indicate that 
direct measurement of protein expression also is essential for the 
analysis of biological processes and systems. 

Global analysis of gene expression at the protein level is now 
also termed proteomics. The standard method for quantitative 
proteome analysis combines protein separation by high- 
resolution (isoelectric focusing/SDS-PAGE) two-dimensional 
gel electrophoresis (2DE) with mass spectrometric (MS) or 
tandem MS (MS/MS) identification of selected protein spots. 
Important technical advances related to 2DE and protein MS 
have increased sensitivity, reproducibility, and throughput of 
proteome analysis while creating an integrated technology. 

By using 2DE with extended pH range and high-sensitivity 
protein identification by electrospray ionization and MS/MS, we 
have evaluated the potential of the 2DE-MS strategy to serve as 
the technology base for comprehensive and quantitative pro- 
teome analysis. 

Materials and Methods 

Yeast Strain and Growth Conditions. The source of protein for all
experiments was yeast strain YPH499 (MATa ura3-52 lys2-801 ade2-101
leu2-Δ1 his3-Δ200 trp1-Δ63) (6). Cells were grown to log phase
(2 × 10⁷ cells/ml) in yeast extract/peptone rich medium with 2%
galactose at 30°C. Protein was harvested as described by Garrels and
coworkers (7). Harvested protein was lyophilized, resuspended in
isoelectric focusing gel rehydration solution, and stored at -80°C.

Preparation of Narrow-Range Immobilized pH Gradient (nrIPG) Strips.
nrIPGs were cast in a U-frame on a GelBond PAG film according to
nomograms (application note 324, Amersham Pharmacia Biotech). The pH
gradient was poured in a 125 X 260 X 0.5-mm frame resulting in
gradient pH strips of approximately 25 cm in length. For casting a
4.5-5.5 pH gradient, the following acidic and basic solutions were
mixed. Acidic solution: 1 ml of acrylamide (30% T/3% C), 0.358 ml of
Immobiline pK 4.6, 0.163 ml of Immobiline pK 9.3, 2.15 ml of 87%
glycerol, 7.5 µl of N,N,N′,N′-tetramethylethylenediamine (TEMED), and
7.5 µl of ammonium persulfate (equilibrated to 7.5 ml of total
volume). Basic solution: 1 ml of acrylamide (30% T/3% C), 0.657 ml of
Immobiline pK 4.6, 0.604 ml of Immobiline pK 9.3, 0.8 ml of 87%
glycerol, 7.5 µl of TEMED, and 7.5 µl of ammonium persulfate
(equilibrated to 7.5 ml of total volume). The pH of the basic solution
was neutralized with 4 M HCl before polymerization. After a 16-h
polymerization at 37°C, and drying, approximately 0.3 mm wide nrIPG
strips were cut.

2DE and Protein Identification. Protein (500 µg) was mixed with IPG
rehydration buffer (8 M urea/2% NP-40/10 mM DTT; final volume =
360 µl). The strips were allowed to rehydrate overnight, focused,
equilibrated, apposed to the second dimension (10%) gels, and run as
described (8). Gels were silver stained (9) and dried between two
sheets of cellophane. A 4-cm² region of the gel was arbitrarily
selected, and 50 protein spots were excised and subjected to in-gel
tryptic digestion (9). Extracted peptides were stored at -20°C until
analysis by microcapillary liquid chromatography (LC)-MS/MS as
described (5, 10, 11). MS/MS spectra were searched automatically
against the yeast protein database by using SEQUEST software (12).
Multiple peptides from each protein generally were detected, adding
confidence to the protein identifications.

Identification of Low Abundance Proteins. Soluble yeast protein was 
harvested as described above. Fifty milligrams of protein in 
loading buffer (75 mM Tris buffer, pH 6.8/10% glycerol/0.5% 
SDS/0.01% bromophenol blue) was loaded into a single large 
well (10 cm in length) of a 10% polyacrylamide gel slab (150 X 

Fig. 1. (A) Narrow pH range isoelectric focusing 2D gel (pH 4.9-5.5). Soluble yeast protein (500 µg) was loaded onto the gel. More than 1,500 features were
visible by silver staining. (B) An arbitrary 4-cm² region of the gel was selected for analysis. Numbers show the 50 spots that were identified by the MS techniques
described in the text.
Table 1. Proteins identified in selected area from Fig. 1B

*Gene names, predicted molecular masses, pI values, and codon bias values
are from the Yeast Protein Database (YPD) (13).



120 X 15 mm). The gel was run at 40 mA constant and stained with
colloidal Coomassie blue. A strip (3 mm X 100 mm) representing
proteins with molecular masses of ~68-85 kDa was cut from the gel. The
gel strip was cut into 1-mm³ pieces, which were subjected to in-gel
digestion with trypsin (9). Extracted peptides were lyophilized,
resolubilized in buffer A (5 mM KH₂PO₄/25% acetonitrile, pH 3.0),
loaded onto a polysulfoethyl A column (2.1 X 250 mm; PolyLC, Columbia,
MD), and eluted with buffer B (same as buffer A with 250 mM KCl added)
in a 60-min linear gradient from 0 to 100% B at a flow rate of
200 µl/min. Fractions were collected every minute. Five fractions
(numbers 10, 15, 20, 25, and 30) were reduced in volume from 200 µl to
50 µl, and 5 µl of each sample was independently analyzed by
microcapillary LC-MS/MS. Gradients for on-line LC-MS were extended to
2 h, and one MS scan was followed by four MS/MS scans on the four
most-intense peptide ions. More than 4,000 MS/MS spectra were
collected in each 2-h run. SEQUEST analysis was performed against the
yeast database without the tryptic constraint, and a peptide sequence
was considered to be a match if (i) the cross-correlation score was
greater than 2.0; (ii) the ends were tryptic; and (iii) the molecular
mass of the matched protein was between 68 and 85 kDa.
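
The three acceptance criteria above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `PeptideMatch` record and its field names are hypothetical and are not part of the SEQUEST output format, and "tryptic ends" is simplified to cleavage after K or R (or a protein terminus), ignoring the proline rule.

```python
from dataclasses import dataclass


@dataclass
class PeptideMatch:
    """Hypothetical record for one database match (not a SEQUEST format)."""
    sequence: str            # matched peptide sequence
    prev_residue: str        # residue preceding the peptide ("-" at the N terminus)
    next_residue: str        # residue following the peptide ("-" at the C terminus)
    xcorr: float             # cross-correlation score
    protein_mass_kda: float  # predicted mass of the matched protein


def has_tryptic_ends(match: PeptideMatch) -> bool:
    # Simplified rule: trypsin cleaves after K or R; protein termini also count.
    n_term_ok = match.prev_residue in ("K", "R", "-")
    c_term_ok = match.sequence[-1] in ("K", "R") or match.next_residue == "-"
    return n_term_ok and c_term_ok


def is_accepted(match: PeptideMatch,
                min_xcorr: float = 2.0,
                mass_range_kda: tuple = (68.0, 85.0)) -> bool:
    # Criteria (i)-(iii): score above threshold, tryptic ends, mass in the gel band.
    low, high = mass_range_kda
    return (match.xcorr > min_xcorr
            and has_tryptic_ends(match)
            and low <= match.protein_mass_kda <= high)
```

Applying the tryptic requirement after an unconstrained search, as the text describes, means that nontryptic candidates are scored but then rejected, which adds a specificity check on top of the score threshold.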

Results 

Analysis of Yeast Proteins by nrIPG-2DE. A standard 2D gel with an
isoelectric focusing range of pH 3-10 and a protein load of 50 µg
results in a gel with more than 1,000 features visualized by silver
staining (5). By narrowing the pH range from 7 to 1 unit/20 cm gel and
by increasing the protein load to 500 µg, more than 1,500 spots could
be visualized on a single silver-stained gel (Fig. 1A). By
extrapolation, if seven 2D gels were run, each with a consecutive 1 pH
unit range, more than 10,000 features might be visualized from a yeast
cell lysate. This number of features surpasses the number of proteins
predicted to be potentially expressed in yeast (6,139 proteins) (13).
To characterize the proteins detected by nrIPG-2DE, an arbitrary
region of 4 cm² was chosen from the gel (Fig. 1B), and the proteins
migrating to 50 spots in the selected area were identified by MS
techniques (11, 14) (Table 1).
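
The extrapolation above is simple arithmetic, sketched here for concreteness. It assumes, as the extrapolation itself does, that every 1 pH unit window would show as many features as the one gel that was run; the variable names are illustrative only.

```python
# Values taken from the text.
FEATURES_PER_NARROW_GEL = 1500   # features seen on the single 1 pH unit gel
PH_UNITS_COVERED = 7             # a standard pH 3-10 gel spans 7 pH units
PREDICTED_YEAST_PROTEINS = 6139  # proteins predicted from the yeast genome

# Assumption: uniform feature density across all seven 1 pH unit windows.
extrapolated_features = FEATURES_PER_NARROW_GEL * PH_UNITS_COVERED
features_per_predicted_protein = extrapolated_features / PREDICTED_YEAST_PROTEINS

print(extrapolated_features)
print(round(features_per_predicted_protein, 2))
```

A ratio above 1 means the spot count alone exceeds the predicted proteome, which is why the identity of the spots, not their number, has to be examined.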

Evaluation of Protein Abundance Based on Codon Bias Value. The codon
bias value for a gene is its propensity to use only one of several
codons to incorporate a specific amino acid into the polypeptide chain
(5, 15, 16). It is known that more highly expressed proteins have
large codon bias values (>0.2). The

Fig. 2. Codon bias value distributions in the yeast genome and on a narrow-range isoelectric focusing gel. (A) The entire yeast genome. (B) Genes predicted
to run within selected area based on pI (5.25-5.45) and molecular mass (38-54 kDa). (C) Unique genes detected in selected area. (D) Genes both predicted and
found. Codon bias value is a predictor of protein abundance with lower abundance proteins generally arising from genes with codon bias values <0.2. No low
abundance proteins were detected with codon bias values <0.1 even though more than one-half of the proteins in yeast fall into this category.



proteins identified in the selected area were examined with respect to
the codon bias values of their respective genes. Fig. 2A shows the
distribution of codon bias values (codon bias distribution; CBD) for
the entire yeast genome with the largest number of genes falling in
the 0.0-0.1 category. Fig. 2B shows the CBD for genes predicted from
the yeast genome to be present in the selected gel region based on
theoretical pI and predicted molecular mass values. There are 58
proteins predicted to run within the selected area, and their CBD is
similar to the distribution of the entire genome. Fig. 2C shows the
CBD for the genes identified from the selected gel area. No protein
with a codon bias value of <0.1 was found, even though more than
one-half of yeast genes fall into this range. Fig. 2D shows the CBD of
the genes both predicted and found in the gel region. Only 14 of the
58 proteins were found within the area. Additionally, no protein was
found that had a codon bias value <0.1 even though 10 of these
proteins (i.e., YPL053C, YLL006W, YNL330C, YMR270C, YGL100W, YGL213C,
YMR028W, YDR400W, YBR246W, and YDL024C) are likely expressed because
mRNA expression has been reported (6). These data indicate that
proteins of lesser abundance were not detected and that a larger
number of proteins migrated to the selected region of the gel than
were predicted.
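
A codon bias distribution of the kind compared above can be tabulated by binning per-gene codon bias values in steps of 0.1. The sketch below is illustrative only: the function names are hypothetical, and the example values are invented placeholders, not data from this study or from YPD.

```python
import math
from collections import Counter


def codon_bias_distribution(values, bin_width=0.1):
    """Count genes per codon bias bin; bin k covers [k*w, (k+1)*w)."""
    counts = Counter()
    for value in values:
        # round() guards against floating-point artifacts such as 0.3 / 0.1 = 2.999...
        counts[math.floor(round(value / bin_width, 9))] += 1
    return dict(counts)


def fraction_below(values, threshold=0.1):
    """Fraction of genes whose codon bias value is below the threshold."""
    values = list(values)
    return sum(1 for v in values if v < threshold) / len(values)


# Invented placeholder values, for illustration only.
example_values = [0.02, 0.05, 0.08, 0.15, 0.30, 0.45, 0.72]
example_cbd = codon_bias_distribution(example_values)
example_low_fraction = fraction_below(example_values)
```

Comparing the bin counts for a predicted gene set with those for the identified gene set exposes the pattern described in the text: an empty 0.0-0.1 bin among identified proteins despite a well-populated one among predicted genes.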

Evaluation of Comigrating Proteins and Differential Migration
Products. Products expressed from a single gene can migrate to
multiple spots on 2D gels for a variety of reasons, including
differential protein processing and posttranslational or artifactual
modifications. In this study, we encountered numerous examples of this
behavior (Table 1). The protein products from multiple genes also can
run to the same coordinates on a gel. Surprisingly, several
occurrences of comigrating proteins were found when using the nrIPGs
for 2DE. For example, the protein products of six different genes all
migrated to a faintly silver-stained spot (Table 2). Both differential
migration and comigration of proteins complicate comparative,
quantitative pattern analyses of 2D gel databases.

Large Starting Amounts Are Required to Analyze Low Abundance 
Proteins. Because no proteins from genes with codon bias values 
of <0.1 were detected in this study, we attempted to detect low 
abundance proteins by increasing the amount of starting protein 
beyond the capacity of 2D gels. Fifty milligrams of protein was 
separated by one-dimensional SDS-PAGE, and a 3 mm X 10 cm 
strip was cut out of the Coomassie-stained gel. Peptides recov- 
ered after tryptic digestion of this sample were separated by 
strong cation exchange chromatography and selected fractions 
were analyzed by reverse-phase microcapillary LC-MS/MS. The 
number of proteins detected between 68 and 85 kDa was 193. 
This is more than one-third of the proteins predicted to migrate 
to the selected region of the gel (Fig. 3). The CBD of the genes 
detected was similar, but not identical, to the CBD for the entire 
genome, and many proteins with codon bias values <0.1 were 
detected. 

These results indicate that low abundance proteins can be 
analyzed if larger starting amounts of proteins are used. Table 3 
shows the calculated effect of larger starting loads on visualizing 
individual low abundance proteins. Calculated data for proteins 
present at 1,000, 100, and 10 copies/cell represent relatively low 
abundance proteins compared to proteins present at 10⁴-10⁶
copies/cell. Indeed, in our previous work using standard 2DE
techniques and a 40-µg protein load of whole yeast lysate, the
average protein abundance detected was 51,200 copies/cell and 
no proteins were detected with abundances < 1,000 copies/cell 

Table 2. Proteins comigrating in a single silver-stained spot on a
2D gel (spot 35 from Fig. 1B)

Gene name*    pI*     Molecular mass, kDa
GDH1          5.50    49.6
RPT3          5.32    47.9
SUB2          5.36    50.3
TIF3          5.17    48.5
VMA1          5.09    67.7
YFR044C       5.54    52.9

Peptide sequences identified† (in table order):
(K)VIELGGTWSLSDSK
(K)FIAEGSNMGSTPEAIAVFETAR
(R)EIGYLFGAYR
(K)VLPIVSVPER
(R)ENAPSIIFIDEVDSIATK
(R)DVQEIFR
(K)LTLHGLQQYYIK
(R)INLAINYDLTNEADQYLHR
(K)NKDTAPHIWATPGR
(R)FLQNPLEIFVDDEAK
(R)GSNFQGDGREDAPDLDWGAAR
(R)ADLVAVLK
(K)ITIPIETANANTIPLSELAHAK
(R)EREEVDIDWTAAR
(R)EREEPDIDWSAAR
(K)VGHDNLVGEVIR
(R)YPSLSIHGVEGAFSAQGAK
(K)LVYGVDPDFTR
(K)FISEQLSQSGFHDIK
(R)TELIHDGAYVWSDPFNAQFTAAK
(K)ILIDGIDEMVAPLTEK

*Gene names, predicted pI, and molecular mass values are from the Yeast
Proteome Database (YPD) (13).
†Peptide sequences were identified automatically and verified manually by
using SEQUEST (12).



(5). With 0.5 mg of starting protein, proteins present at 1,000 
copies/cell could be visualized by silver staining, but those 
present at 100 and 10 copies/cell could not be visualized (Table 
3). To achieve meaningful comparative and global protein 
expression profiles of different cell states by 2DE, prefraction- 
ation of milligram amounts of protein before 2DE is essential. 
However, it is likely that during preelectrophoretic protein 
fractionation, the ability for accurate protein quantification 
becomes compromised. 
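
The entries in Table 3 follow from the assumptions given in its footnotes (detection limits of 1 ng for silver and 100 ng for Coomassie, a 50-kDa protein, 1 mg of soluble protein per 6 × 10⁷ cells, and 100% efficiency). The sketch below reproduces that arithmetic; the function name and the rounded Avogadro constant are choices made here for illustration.

```python
AVOGADRO = 6.022e23  # molecules per mole (rounded)


def required_starting_material(copies_per_cell, detection_limit_ng,
                               protein_mass_kda=50.0, cells_per_mg=6e7):
    """Return (total protein in mg, number of cells) needed so that one
    protein species reaches the staining detection limit, assuming 100%
    efficiency at every step."""
    grams_needed = detection_limit_ng * 1e-9
    moles_needed = grams_needed / (protein_mass_kda * 1e3)  # kDa -> g/mol
    molecules_needed = moles_needed * AVOGADRO
    cells_needed = molecules_needed / copies_per_cell
    total_protein_mg = cells_needed / cells_per_mg
    return total_protein_mg, cells_needed


for copies in (10, 100, 1_000, 10_000, 100_000):
    silver_mg, silver_cells = required_starting_material(copies, 1.0)
    coomassie_mg, coomassie_cells = required_starting_material(copies, 100.0)
    print(f"{copies:>7}  {silver_mg:9.3f} mg  {silver_cells:.2e} cells  "
          f"{coomassie_mg:8.1f} mg  {coomassie_cells:.2e} cells")
```

Under these assumptions a 0.5-mg load exceeds the roughly 0.2 mg needed for a protein at 1,000 copies/cell but falls short of the roughly 2 mg needed at 100 copies/cell, matching the statement in the text.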

Discussion 

The ability to identify proteins separated by 2DE has resulted in 
the analysis of many hundreds of spots. However, in studies in 
which total cell yeast lysates were separated by 2DE and the 
resulting spots were identified, only abundant proteins have been 
detected (5, 16-18). In this study, we have systematically exam- 
ined the potential of the 2DE-MS/MS technology to provide 
global protein expression profiles from unseparated yeast cell 
lysates. The results indicate that the method is unsuited for the
analysis of low abundance proteins and that statements about the
feasibility and straightforwardness of proteome analysis based
on 2DE-MS should be rethought (17).

The focus of this study was a 2D gel with a 1 pH unit isoelectric 
focusing range. More than 1,500 spots could be visualized by 
silver staining. By extrapolation, a full complement of 1 pH unit 
gels might contain as many as 10,500 features (1,500 spots X 7 
pH units). This greatly exceeds the 6,139 proteins predicted from 
the yeast genome (13). It might be concluded based on spot 
counting alone that 2D gels could be the basis for global 
proteome analysis. However, analysis of protein spots by MS 
identified generally abundant proteins (codon bias values >0.2). 
Clearly the number of spots on a 2D gel is not representative of 
the overall number or classes of expressed genes that can be 
analyzed. 

Differential protein processing (producing more than one spot 
per protein) and comigrating spots present problems for both 

Fig. 3. Low abundance proteins can be detected if the starting protein load
is large. Fifty milligrams of soluble yeast protein was separated by SDS/PAGE
in a single 10-cm wide lane. A band corresponding to a molecular mass range
of 68 to 85 kDa was excised and in-gel trypsinized, and peptides were
separated by strong cation exchange chromatography followed by on-line
separation and analysis by LC-MS/MS techniques. Proteins in the gel band were
identified from MS/MS spectra by the computer program SEQUEST (12). (A)
Distribution of codon bias for genes predicted to be between 68 and 85 kDa
in molecular mass. This pattern is similar to that shown in Fig. 2A, which
shows the distribution for all yeast genes. (B) Distribution of codon bias for
genes identified within the gel band. Many proteins from genes with low codon
bias values (<0.1) were identified, including three protein kinases and a
transcription factor.



quantitative protein expression comparison and database match- 
ing. We detected as many as six proteins comigrating to the same 
faintly silver-stained spot (Table 2). In addition, many proteins 
were detected in multiple spots within the gel region selected for 
analysis (Table 1), and it is likely that other forms of the same 
proteins migrated to spots outside of the selected area. With 
protein levels in yeast cells being expressed over a range of at 
least 5 orders of magnitude, small populations of differentially 



Table 3. Theoretical required total starting protein amounts for
individual protein visualization by staining

                      Silver staining*              Coomassie staining*
Protein abundance,    Protein       Number of       Protein       Number of
copies per cell       amount, mg†   cells           amount, mg†   cells
10                    20.073        1.20 × 10⁹      2007.3        1.20 × 10¹¹
100                   2.007         1.20 × 10⁸      200.7         1.20 × 10¹⁰
1,000                 0.201         1.20 × 10⁷      20.1          1.20 × 10⁹
10,000                0.020         1.20 × 10⁶      2.0           1.20 × 10⁸
100,000               0.002         1.20 × 10⁵      0.2           1.20 × 10⁷

*Protein detection limits for silver and Coomassie staining were 1 and 100 ng,
respectively.
†Soluble yeast protein was calculated based on 1 mg of yeast protein being
derived from harvesting 6 × 10⁷ cells. Calculations are based on a protein
molecular mass of 50 kDa and 100% efficiencies of the procedures used.

processed or posttranslationally modified forms of abundant
proteins compete for localization in the gel with lower abun- 
dance proteins and greatly increase the overall number of 
features visible on a 2D gel. 

There are 61 possible codons that code for 20 amino acids. 
Codon bias is a measure of the propensity of an organism to 
selectively use certain codons that result in the incorporation of 
the same amino acid residue in a growing polypeptide chain. The 
larger the codon bias value, the fewer the number of codons that 
are used to encode the protein (15). It is thought that codon bias 
is a measure of protein abundance because highly expressed 
proteins generally have large codon bias values (16). We have 
shown previously that codon bias appears to be an excellent 
indicator of the boundaries of current 2D gel proteome analysis 
technology (5). There are thousands of yeast genes with ex- 
pressed mRNA (6) and likely expressed protein with codon bias 
values <0.1 (Fig. 2A). In this study, we detected none of them 
by a 2D gel-based approach (Fig. 2C). Indeed, in every examined 
yeast proteome study (5, 16-18) in which the combined total 
number of identified proteins is >400, this same observation is 
true. It is expected that for the more complex cells of higher 
eukaryotic organisms the detection of low abundance proteins 
would be even more challenging than in yeast. Clearly, highly 
abundant proteins are overwhelmingly detected in proteome 
studies. If proteome analysis is to provide truly meaningful 
information about cellular and regulatory processes, it must be 
able to penetrate to the level of regulatory proteins, which are 
typically of low abundance. 

In this study, we examined the use of 2D gels with extended 
separation range and increased sample load. The data indicated 
that there was a somewhat improved detection of proteins of 
moderate abundance but that low abundance proteins charac- 
terized by codon bias values <0.1 (more than one-half of the 
genes in the yeast genome) were still generally undetectable. We 
therefore conclude that the current proteome technology, used 
without sample preenrichment, is not suitable for the global 
detection of proteins expressed by the cell and that the con- 



struction of complete, quantitative proteome maps based on the 
2DE-MS/MS approach will be very challenging, even for rela- 
tively simple, unicellular organisms. 

Low abundance proteins were, however, detected if larger 
starting amounts were used. Beginning with 50 mg of total yeast 
protein and by using a strategy that included SDS/PAGE, in-gel 
digestion, strong cation exchange chromatography separation, 
and on-line microcapillary LC-MS/MS techniques, we success- 
fully detected 193 proteins with more than 60 proteins from 
genes with codon bias values of <0.1 including three protein 
kinases (YNL020C, YLR248W, and YMR216C) and one tran- 
scription factor (YML007W). These data support the need to 
develop strategies that allow for large starting amounts of 
protein to provide at least femtomole amounts of each protein 
to be delivered to the MS (Table 3). The most favorable 
techniques currently use multiple chromatographic steps such as 
the pairing of strong cation exchange and reverse-phase chro- 
matography in this study (19). 

Conclusions 

The 2D-gel-based proteome analysis has been successfully used 
to detect and characterize marker proteins that are idiotypic for 
a specific physiologic or pathologic state of a cell or tissue. 
However, it is now apparent that the 2DE-MS/MS approach is 
unsuitable to detect, identify, and quantify every protein in a 
sample, a task that seems necessary for the comprehensive 
analysis and eventual mathematical description of biological 
processes and systems. For this reason, it is necessary to develop 
novel techniques that allow for much increased starting amounts 
while permitting large-scale quantitative comparison of protein 
expression. 